A New Approach to Search Result Clustering and Labeling
نویسندگان
چکیده
A NEW APPROACH TO SEARCH RESULT CLUSTERING AND LABELING Anıl Türel M.S. in Computer Engineering Supervisor: Prof. Dr. Fazlı Can August, 2011 Search engines present query results as a long ordered list of web snippets divided into several pages. Post-processing of information retrieval results for easier access to the desired information is an important research problem. A post-processing technique is clustering search results by topics and labeling these groups to reflect the topic of each cluster. In this thesis, we present a novel search result clustering approach to split the long list of documents returned by search engines into meaningfully grouped and labeled clusters. Our method emphasizes clustering quality by using cover coefficient and sequential k-means clustering algorithms. Cluster labeling is crucial because meaningless or confusing labels may mislead users to check wrong clusters for the query and lose extra time. Additionally, labels should reflect the contents of documents within the cluster accurately. To be able to label clusters effectively, a new cluster labeling method based on term weighting is introduced. We also present a new metric that employs precision and recall to assess the success of cluster labeling. We adopt a comparative evaluation strategy to derive the relative performance of the proposed method with respect to the two prominent search result clustering methods: Suffix Tree Clustering and Lingo. Moreover, we perform the experiments using the publicly available Ambient and ODP-239 datasets. Experimental results show that the proposed method can successfully achieve both clustering and labeling tasks.
منابع مشابه
An Integer Programming Model and a Tabu Search Algorithm to Generate α-labeling of Special Classes of Quadratic Graphs
First, an integer programming model is proposed to find an α-labeling for quadratic graphs. Then, a Tabu search algorithm is developed to solve large scale problems. The proposed approach can generate α-labeling for special classes of quadratic graphs, not previously reported in the literature. Then, the main theorem of the paper is presented. We show how a problem in graph theory c...
متن کاملImproved Automatic Clustering Using a Multi-Objective Evolutionary Algorithm With New Validity measure and application to Credit Scoring
In data mining, clustering is one of the important issues for separation and classification with groups like unsupervised data. In this paper, an attempt has been made to improve and optimize the application of clustering heuristic methods such as Genetic, PSO algorithm, Artificial bee colony algorithm, Harmony Search algorithm and Differential Evolution on the unlabeled data of an Iranian bank...
متن کاملA New Method for Clustering Wireless Sensor Networks to Improve the Energy Consumption
Clustering is an effective approach for managing nodes in Wireless Sensor Network (WSN). A new method of clustering mechanism with using Binary Gravitational Search Algorithm (BGSA) in WSN, is proposed in this paper to improve the energy consumption of the sensor nodes. Reducing the energy consumption of sensors in WSNs is the objective of this paper that is through selecting the sub optimum se...
متن کاملTabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach
The clustering problem under the criterion of minimum sum of squares is a non-convex and non-linear program, which possesses many locally optimal values, resulting that its solution often falls into these trap and therefore cannot converge to global optima solution. In this paper, an efficient hybrid optimization algorithm is developed for solving this problem, called Tabu-KM. It gathers the ...
متن کاملبازشناسی جلوههای هیجانی با استفاده از تحلیل تفکیک پذیری مبتنی بر خوشه بندی چهره
Improvement of Facial expression recognition is aim of proposed method. This is a new formulation to the linear discriminant analysis. In the new formulation within-class and between-class covariance matrix are estimated on the each cluster and in the test phase new samples are mapped to the subspace that is related to the cluster of them. At the first we addressed clustering analysis of faces ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011